-
-
Notifications
You must be signed in to change notification settings - Fork 50
Add forked_from
as a pointer either to an existing package or as a reference to a URL
#3363
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Sources/App/Models/Repository.swift
Outdated
var summary: String? | ||
|
||
@Field(key: "forked_from") | ||
var forkedFrom: Fork? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: we're trying (😅) to keep the data fields sorted (although this isn't true generally, because it's something we started half-way through without going back and fixing it everywhere). I'll patch this up real quick.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for the review and fixes. @finestructure could you link the logic for normalizing the repo urls from PackageList? I couldn't find it.
Sources/App/Commands/Ingest.swift
Outdated
} | ||
} catch { | ||
Current.logger().warning("updating forked from failed") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather we didn't stick this in updateRepository
like this - the function should really only update repository from parameters passed in (even if the "computation" is rather trivial) and shouldn't have other side effects like running a package query.
Instead, let's make updateRepository
take a parameter:
func updateRepository(on database: Database,
for repository: Repository,
metadata: Github.Metadata,
licenseInfo: Github.License?,
readmeInfo: Github.Readme?,
s3Readme: S3Readme?,
fork: Fork?) async throws {
and move the "fork logic" to
func getFork(on database: Database, metadata: Github.Metadata) async throws -> Fork? {
guard let parentUrl = metadata.repository?.normalizedParentUrl else { return nil }
if let packageId = try await Package.query(on: database)
.filter(\.$url, .custom("ilike"), parentUrl)
.first()?.id {
return .parentId(packageId)
} else {
return .parentURL(parentUrl)
}
}
which we then use around line 128 before the updateRepository
call:
let fork = try await getFork(on: database, metadata: metadata)
try await updateRepository(on: database,
for: repo,
metadata: metadata,
licenseInfo: license,
readmeInfo: readme,
s3Readme: s3Readme,
fork: fork)
This will also allow us to add a couple of tests specifically for getFork
behaviour like
- returning
nil
on normalisation failure - correct matching when urls differ in case
- covering both the url and the id based lookups
.filter(\Repository.$package.$id == forkedId).first() | ||
|
||
XCTAssertNil(repo?.forkedFrom) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By moving the main functionality to getFork
, these tests become much easier to set up because we don't need to test all of ingest
. Instead we just test getFork
in isolation:
func test_getFork() async throws {
try await Package(id: .id0, url: "https://github.com/foo/parent.git".url, processingStage: .analysis).save(on: app.db)
try await Package(url: "https://github.com/bar/forked.git", processingStage: .analysis).save(on: app.db)
var metadata = Github.Metadata.mock
do { // test lookup when package is in the index
metadata.repository?.parent = .init(url: "https://github.com/foo/parent.git")
let fork = try await getFork(on: app.db, metadata: metadata)
XCTAssertEqual(fork, .parentId(.id0))
}
do { // test lookup when package is in the index but with different case in URL
metadata.repository?.parent = .init(url: "https://github.com/Foo/Parent.git")
let fork = try await getFork(on: app.db, metadata: metadata)
XCTAssertEqual(fork, .parentId(.id0))
}
do { // test lookup when package is not in the index
metadata.repository?.parent = .init(url: "https://github.com/some/other.git")
let fork = try await getFork(on: app.db, metadata: metadata)
XCTAssertEqual(fork, .parentURL("https://github.com/some/other.git"))
}
do { // test lookup when parent url is nil
metadata.repository?.parent = .init(url: nil)
let fork = try await getFork(on: app.db, metadata: metadata)
XCTAssertEqual(fork, nil)
}
}
and then we simply extend a single test where we ensure fork
is written to the db in updateRepository
. For example, I'd modify test_updateRepository_update
and add fork (call on or around line 152):
try await updateRepository(on: app.db,
for: repo,
metadata: md,
licenseInfo: .init(htmlUrl: "license url"),
readmeInfo: .init(etag: "etag",
html: "readme html",
htmlUrl: "readme html url",
imagesToCache: []),
s3Readme: .cached(s3ObjectUrl: "url", githubEtag: "etag"),
fork: .parentURL("https://github.com/foo/bar.git"))
and validate it (around line 168):
XCTAssertEqual(repo.forkedFrom, .parentURL("https://github.com/foo/bar.git"))
This is essentially what determines the normalisation in the |
Thanks! That could be definitely reused. |
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Ucanbarlic.
|
226794d
to
e69091c
Compare
Hey @finestructure I incorporated your suggestions, please take a look when u find the time! I also took the URL approach from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Only a couple of small things left that are essentially changes to my previously proposed changes.
Sources/App/Commands/Ingest.swift
Outdated
|
||
let fork: Fork? | ||
do { | ||
fork = try await getFork(on: database, metadata: metadata) | ||
} catch { | ||
fork = nil | ||
Current.logger().warning("updating forked from failed") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let fork: Fork? | |
do { | |
fork = try await getFork(on: database, metadata: metadata) | |
} catch { | |
fork = nil | |
Current.logger().warning("updating forked from failed") | |
} | |
let fork = try? await getFork(on: database, metadata: metadata) |
Let's just straight up assign it. getFork
is technically throwing due to query but we could even write try? await Package.query
there, to be honest.
In fact, I think
func getFork(on database: Database, metadata: Github.Metadata) async -> Fork? {
is better, because we're mixing using nil
as an "error" value and throws
at the same time - my bad!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also note that I'd left another little tweak to getFork
in a follow up comment on the other PR #3355 (comment)
So in total it should be
func getFork(on database: Database, parent: Github.Metadata.Parent?) async -> Fork? {
guard let parentUrl = parent?.normalizedURL else { return nil }
if let packageId = try? await Package.query(on: database)
.filter(\.$url, .custom("ilike"), parentUrl)
.first()?.id {
return .parentId(packageId)
} else {
return .parentURL(parentUrl)
}
}
Sorry for the back and forth!
Sources/App/Commands/Ingest.swift
Outdated
try await repository.save(on: database) | ||
} | ||
|
||
func getFork(on database: Database, metadata: Github.Metadata) async throws -> Fork? { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func getFork(on database: Database, metadata: Github.Metadata) async throws -> Fork? { | |
func getFork(on database: Database, metadata: Github.Metadata) async -> Fork? { |
Sources/App/Commands/Ingest.swift
Outdated
return nil | ||
} | ||
|
||
if let packageId = try await Package.query(on: database) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if let packageId = try await Package.query(on: database) | |
if let packageId = try? await Package.query(on: database) |
Sources/App/Commands/Ingest.swift
Outdated
} | ||
|
||
private extension URL { | ||
var normalizedParent: Self? { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var normalizedParent: Self? { | |
var normalized: Self? { |
as this is for URLs generally.
And I think we should drop the private
and put it in a new file URL+ext.swift
in Core/Extensions
.
@finestructure thanks for the feedback! Here we go again, could you please review again? Thanks! |
Sources/App/Commands/Ingest.swift
Outdated
s3Readme = .error("\(error)") | ||
} | ||
|
||
let fork: Fork? = try? await getFork(on: database, parent: metadata.repository?.parent) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a warning here, I'll fix it up real quick.
6a6a79a
to
21d2281
Compare
Supercedes #3355 and implements the first part of #551. UI changes still pending.
@supersonicbyte I was seeing test failures in
test_decode_Metadata_null
andtest_fetchMetadata
which I have fixed by makingParent
optional in the GitHub metadata. I don’t believe thatparent
is a mandatory field supplied by GitHub, so this feels like a reasonable fix to me but could you verify please?Thanks!